Abstract

We consider the External Clock Synchronization problem in dynamic sensor networks. Initially, sensors 
obtain inaccurate estimations of an external time reference and subsequently collaborate in order to synchronize
their internal clocks with the external time. For simplicity, we adopt the drift-free assumption, where
internal clocks are assumed to tick at the same pace. Hence, the problem is reduced to an estimation problem, 
in which the sensors need to estimate the initial external time. This work is further relevant to the problem of 
collective approximation of environmental values by biological groups. 

Unlike most works on clock synchronization that assume static networks, this paper focuses on an extreme 
case of highly dynamic networks. Specifically, we assume a non-adaptive scheduler adversary that dictates in 
advance an arbitrary, yet independent, meeting pattern. Such meeting patterns fit, for example, with short-time 
scenarios in highly dynamic settings, where each sensor interacts with only a few other arbitrary sensors.

We propose an extremely simple clock synchronization algorithm that is based on weighted averages, and 
prove that its performance on any given independent meeting pattern is highly competitive with that of the 
best possible algorithm, which operates without any resource or computational restrictions, and knows the 
meeting pattern in advance. In particular, when all distributions involved are Gaussian, the performance of
our scheme coincides with the optimal performance. Our proofs rely extensively on the concept of
Fisher information. We use the Cramér-Rao bound and our definition of a Fisher Channel Capacity to quantify
information flows and to obtain lower bounds on collective performance. This opens the door for further
rigorous quantifications of information flows within collaborative sensors.
1 Introduction

1.1 Background and Motivation 


Representing and communicating information is a main interest of theoretical distributed computing [35]. However,
such studies often seem disjoint from what may be the largest body of work regarding coding and communication:
information theory [8, 39]. Perhaps the main reason for this is that theoretical
distributed computing studies are traditionally concerned with noiseless models of communication, in which the
content of a message that passes from one node to another is not distorted. This reliability in transmission relies
on an implicit assumption that error correction is guaranteed by a lower-level protocol that is responsible for
implementing communication. Indeed, when bandwidth is sufficiently large, one can encode a message with a
large number of error-correcting bits in a way that makes communication noise practically a non-issue.


In some distributed scenarios, however, distortion in communication is unavoidable. One example concerns
the classical problem of clock synchronization, which has attracted a lot of attention from both theoreticians in
distributed computing [2, 26, 29, 34, 41] and practicing engineers [7, 11, 12, 14, 18, 37, 40]; see [?]
for comprehensive surveys. In this problem, processors need to synchronize their internal clocks (either among
themselves only or with respect to a global time reference) relying on relative time measurements between clocks.
Due to unavoidable unknown delays in communication, such measurements are inherently noisy. Furthermore,
since the source of the noise is the delays, error correction does not seem to be of any use for reducing the noise.
The situation becomes even more complex when processors are mobile, preventing them from reducing errors by
averaging repeated measurements to the same processors, and from contacting reliable processors. Indeed, the
clock synchronization problem is particularly challenging in the context of wireless sensor networks and ad hoc
networks, which are typically formed by autonomous, and often mobile, sensors without central control.


Distributed computing models which include noisy communication call for a rigorous comprehensive study
that employs information theoretical tools. Indeed, a recent trend in the engineering community is to view the
clock synchronization problem from a signal processing point of view, and adopt tools from information theory
(e.g., the Cramér-Rao bound) to bound the impact of inherent noise [6, 7, 18, 25]; see [46] for a survey.
However, this perspective has hardly received any attention from theoreticians in distributed computing, who mostly
focused on worst-case message delays [2, 4, 26, 29], which do not seem to be suitable for information theoretic
considerations. In fact, very few works on clock synchronization consider a system with random delays and
analyze it following a rigorous theoretical distributed algorithmic type of analysis. An exception to that is the
work of Lenzen et al. [27], but that work also does not involve information theory. In this paper, we
study the clock synchronization problem through the purely theoretical distributed algorithmic perspective while
adopting the signal processing and information theoretic point of view. In particular, we adopt tools from Fisher
Information theory [42, 47].

We consider the external version of the problem [9, 12, 14, 32, 34, 44], in which processors (referred to as
sensors hereafter) collaborate in order to synchronize their clocks with an external global clock. Informally, sensors
initially obtain inaccurate estimates^1 of a global (external) time reference $\tau^* \in \mathbb{R}$, and subsequently collaborate
to align their internal clocks to be as close as possible to the external clock. To this end, sensors communicate
through uni-directional pairwise interactions that include inherently noisy measurements of the relative deviation
between their internal clocks and, possibly, some complementary information. To focus on the problems
caused by the initial inaccurate estimations of $\tau^*$ and the noise in the communication, we restrict our attention
to drift-free settings [21, 23], in which all clocks tick at the same rate. This setting essentially reduces the problem
to the problem of estimating $\tau^*$. See, e.g., [?] for works on estimation in the engineering community.

^1 Traditional protocols like NTP [31] and TEMPO [11] use an external standard like GPS (Global Positioning System) or UTC
(Universal Time) to synchronize networks. However, the use of such systems poses a high demand for energy, which is usually undesired
in sensor networks. Hence, works in sensor networks typically assume that one source processor obtains an accurate estimate of the
global time reference and essentially governs the synchronization of the rest of the sensors [?]. Here, we generalize this framework by
assuming that each processor may initially have a different estimate quality of the global time reference, and our goal is to investigate
what can be achieved given the qualities of initial estimations.

With very few exceptions that effectively deal with dynamic settings [10, ?], almost all works on clock
synchronization (and estimation) considered static networks. Indeed, the construction of efficient clock
synchronization algorithms for dynamic networks is considered a very important and challenging task^2 [38, 44]. This
paper addresses this challenge by considering highly dynamic networks in which sensors have little or no control
over whom they interact with. Specifically, we assume a non-adaptive scheduler adversary that dictates in advance a
meeting pattern for the sensors. However, the adversary we assume is not unlimited. Specifically, for simplicity,
in this initial work we restrict the adversary to provide independent meeting patterns only, in which it is
guaranteed that whenever a sensor views another sensor, their transitive histories are disjoint^3. Although they are not
very good representatives of communication in static networks, independent meeting patterns fit well with highly
stochastic communication patterns during short time scales, in which each sensor observes only a few other
arbitrary sensors (see more discussion in Section 2.1). Given such an adversarial meeting pattern, we are concerned
with minimizing the deviation of each internal clock from the global time.

As our objective is to model small and simple sensors, we are interested in algorithms that employ elementary
internal computations and economic use of communication. We use competitive analysis to evaluate the
performance of algorithms, comparing them to the best possible algorithm that knows the whole meeting pattern in
advance and operates under the most liberal version of the model, which allows for unrestricted resources in terms
of memory and communication capacities, and individual computational ability.

1.2 Our contribution 

Lower bounds on optimal performance. We first consider algorithm Opt, the best possible algorithm operating
on the given independent meeting pattern. We note that specifying Opt seems challenging, especially
since we do not assume a prior distribution on the starting global time, and hence the use of Bayesian statistics
seems difficult. Fortunately, for our purposes, we are merely interested in lower bounding the performance of
that algorithm. We achieve this by relating the smallest possible variance of a sensor at a given time to the
largest possible Fisher Information (FI) of the sensor at that time. This measure quantifies the sensor's current
knowledge regarding the relative deviation between its local time and the global time. We provide a recursive
formula to calculate $J_a$, the FI at sensor $a$, for any sensor $a$. Specifically, initially, the FI at a sensor is the Fisher
information in the distribution family governing its initial deviation from the global time (see Section 2.4 for the
formal definitions). When sensor $a$ observes sensor $b$, the FI at $a$ after this observation (denoted by $J'_a$) satisfies:

$$J'_a \le J_a + \frac{1}{\frac{1}{J_b} + \frac{1}{J_N}}, \qquad (1)$$

where $J_N$ is the Fisher Information in the noise distribution related to the observation. To obtain this formula we
prove a generalized version of the Fisher information inequality [42, 47]. Relying on the Cramér-Rao bound [?],
this formula is then used to bound the corresponding variance under algorithm Opt. Specifically, the variance of
the internal clock of sensor $a$ is at least $1/J_a$.
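
As a small numerical illustration of inequality (1), the following Python sketch evaluates its right-hand side and the resulting Cramér-Rao variance bound. The function names and the sample values are hypothetical and chosen only for illustration; this is not an implementation of Opt.

```python
def fisher_upper_bound_after_observation(j_a, j_b, j_n):
    """Right-hand side of inequality (1): J_a + 1 / (1/J_b + 1/J_N)."""
    if min(j_a, j_b, j_n) <= 0:
        raise ValueError("Fisher information values must be positive")
    return j_a + 1.0 / (1.0 / j_b + 1.0 / j_n)


def variance_lower_bound(fisher_information):
    """Cramér-Rao: the variance of an unbiased estimator is at least 1/J."""
    return 1.0 / fisher_information


# Illustrative values (assumptions): observer FI 1, observed FI 4, noise FI 4.
bound = fisher_upper_bound_after_observation(1.0, 4.0, 4.0)
print(bound)                        # upper bound on the FI after the observation
print(variance_lower_bound(bound))  # matching lower bound on the variance
```

In this example the gain in FI is 2, strictly below the noise FI of 4, in line with the observation that a single interaction cannot add more than $J_N$.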

Equation (1) provides immediate bounds on the convergence time. Specifically, since $1/(1/J_b + 1/J_N) < J_N$, the
inequality sets a bound of $J_N$ for the increase in the FI per interaction. In analogy to Channel Capacity as defined
by Shannon [?], we term this upper bound the Fisher Channel Capacity. Given small $\epsilon > 0$, we define the convergence
time $T(\epsilon)$ as the minimal number of observations required by the typical sensor until its variance drops below
$\epsilon^2$ (see Section 2.1 for the formal definition). Let $J_0$ denote the median initial Fisher Information of sensors.
Based on the Fisher Channel Capacity we prove the following.

^2 For example, dynamic meeting patterns prevent the use of classical external clock synchronization algorithms (e.g., [31, 34]) that are
based on one or a few source sensors that obtain an accurate estimation of the global time and govern the synchronization of other sensors.

^3 Another informal way to view such patterns is that they guarantee that, given the global time, whenever a sensor views another sensor,
their local clocks are independent; see Section 2.1 for a formal definition.
Theorem 1.1. Assume that $J_0 \ll 1/\epsilon^2$ for some small $\epsilon > 0$. Then $T(\epsilon) \ge (1/\epsilon^2 - J_0)/J_N \approx 1/(\epsilon^2 J_N)$.
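
To make the bound of Theorem 1.1 concrete, the sketch below simply evaluates $(1/\epsilon^2 - J_0)/J_N$ for given parameters. The function name and the example values are assumptions made for illustration only.

```python
def convergence_time_lower_bound(eps, j0, j_n):
    """Evaluate (1/eps^2 - J_0) / J_N, the bound stated in Theorem 1.1.

    eps : target standard deviation (variance target is eps**2)
    j0  : median initial Fisher information of the sensors
    j_n : Fisher information of the noise (the Fisher Channel Capacity)
    """
    if eps <= 0 or j0 <= 0 or j_n <= 0:
        raise ValueError("all parameters must be positive")
    # If the initial FI already exceeds 1/eps^2, no observation is needed.
    return max(0.0, (1.0 / eps ** 2 - j0) / j_n)


# Illustrative values (assumptions): eps = 0.1, J_0 = 1, J_N = 4.
print(convergence_time_lower_bound(0.1, 1.0, 4.0))
```

With these values the typical sensor needs roughly 25 observations, since each observation can add at most $J_N = 4$ to a Fisher information that must grow from 1 to about 100.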

A highly competitive elementary algorithm. We propose a simple clock synchronization algorithm and prove that
its performance on any given independent meeting pattern is highly competitive with that of the optimal one. That
is, estimations of global time at each sensor remain unbiased throughout the execution and the variance at any
given time is $\Delta_0$-competitive with the best possible variance, where $\Delta_0$ is the initial Fisher-tightness (see the
definition in Section 2.4). In contrast to the optimal algorithm, which may be based on transmitting complex functions in each
interaction and on performing complex internal computations, our simple algorithm is based on far more basic
rules. First, transmission is restricted to a single accuracy parameter. Second, using the noisy measurement of
deviation from the observed sensor, and the accuracy of that sensor, the observing sensor updates its internal
clock and accuracy parameter by careful, yet elementary, weighted-averaging procedures.
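
The precise update rules are not specified in this part of the text. As a hedged sketch of what such a weighted-averaging step could look like, the Python code below uses standard inverse-variance weighting and assumes that the accuracy parameter is the reciprocal of the sensor's variance and that the noise variance is known. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    opinion: float    # current estimate of the global time
    accuracy: float   # assumed to be the reciprocal of the opinion's variance


def observe(a, measured_deviation, accuracy_b, noise_variance):
    """Update sensor `a` after it observes a sensor with accuracy `accuracy_b`.

    `measured_deviation` is the noisy value X_b - X_a + noise. The quantity
    X_a + measured_deviation is an estimate of the global time whose variance
    is 1/accuracy_b + noise_variance (inverse-variance weighting, assumed).
    """
    observation_accuracy = 1.0 / (1.0 / accuracy_b + noise_variance)
    weight = observation_accuracy / (a.accuracy + observation_accuracy)
    a.opinion += weight * measured_deviation   # elementary weighted average
    a.accuracy += observation_accuracy         # accuracies of independent estimates add


# Illustrative values (assumptions).
a = Sensor(opinion=0.0, accuracy=1.0)
observe(a, measured_deviation=2.0, accuracy_b=4.0, noise_variance=0.25)
print(a.opinion, a.accuracy)
```

Only a constant number of additions, multiplications and divisions are used, and the only value the observed sensor has to transmit is its accuracy. In the Gaussian case, where accuracy and Fisher information coincide, the accuracy update has the same form as the right-hand side of inequality (1).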

Our weighted-average algorithm is designed to maximize the flow of Fisher information in interactions. This 
is proved by showing that the accuracy parameter is, at all times, both representative of the reciprocal of the 
sensor’s variance and close to the Fisher Information upper bound. In short, we proved the following. 

Theorem 1.2. There exists a simple weighted-average based clock synchronization algorithm which is
$\Delta_0$-competitive (at any sensor and at any time).

Two important corollaries of Theorem 1.2 follow directly from the definition of the initial Fisher-tightness
$\Delta_0$.

Corollary 1.3. If the number of distributions involved is a constant (independent of the number of sensors), then
our algorithm is $O(1)$-competitive (at any sensor and at any time).

Corollary 1.4. If all distributions involved are Gaussians, then the performance of our algorithm (in terms of
the variances) coincides with the optimal one, for each sensor and at any time.

We note that our algorithm does not require the use of sensor identities and can thus also be employed in
anonymous networks [?], yielding the same performance.

2 Preliminaries 
2.1 The Model 

We consider a collection of $n$ sensors that collaborate in order to synchronize their internal clocks with an external
global clock reference. We consider a set $\mathcal{F}$ of sufficiently smooth (see definition in Section 2.4) probability
density functions (pdfs) centered at zero. One specific distribution among the pdfs in $\mathcal{F}$ is the noise distribution,
referred to as $N(\eta)$. Each sensor $a$ is associated with a distribution $\Phi_a \in \mathcal{F}$ which governs the initialization
deviation of its internal clock from the global time, as described in the next paragraph. Depending on the specific
model, we assume that sensor $a$ knows various properties of $\Phi_a$. In the most restricted version we consider,
sensor $a$ knows only the variance of $\Phi_a$, and in the most liberal version, $a$ knows the full description of $\Phi_a$.
Execution is initiated when the global time is some $\tau^* \in \mathbb{R}$, chosen by an adversary.

Two important cases are (1) when $\mathcal{F}$ contains a constant number of distributions (independent of the number
of sensors) and (2) when all distributions in $\mathcal{F}$ are Gaussian. Both cases serve as reasonable assumptions for
realistic scenarios. For the former case we shall show asymptotically optimal performance and for the latter
case we will show strictly optimal (non-asymptotic) performance.

Local clocks. Each sensor $a$ is initialized with a local clock $\ell_a(0) \in \mathbb{R}$, randomly chosen according to
$\Phi_a(x - \tau^*)$, independently of all other sensors. That is, as $\Phi_a$ is centered around zero, the initial local time
$\ell_a(0)$ is distributed around $\tau^*$, and this distribution is governed by $\Phi_a$. We stress that sensor $a$ does not know the value
$\tau^*$ and from its own local perspective the execution started at time $\ell_a(0)$. Sensors rely on both social interactions
and further environmental cues^4 to improve their estimates of the global time. In between such events sensors
are free to perform "shift" operations to adjust their local clocks. To focus on the problems caused by the
initial inaccurate estimations of $\tau^*$ and the noise in the communication, we restrict our attention to drift-free
settings [21, 23], in which all clocks tick at the same rate, consistent with the global time.

Opinions. The drift-free assumption reduces the external clock synchronization problem to the problem of
estimating $\tau^*$. Indeed, recall that local clocks are initialized to different values but progress at the same rate.
Because sensor $a$ can keep the precise time since the beginning of the execution, its deviation from the global
time could be corrected if it knew the difference between $\ell_a(0)$, the initial local clock of $a$, and $\tau^*$, the global
time when the execution started. Hence, one can view the goal of sensor $a$ as estimating $\tau^*$. That is, without
loss of generality, we may assume that all shifts performed by sensor $a$ throughout the execution are shifts of its
initial position $\ell_a(0)$, aiming to align it to be as close as possible to $\tau^*$. Taking this perspective, we associate with
each sensor an opinion variable $X_a$, initialized to $X_a(0) := \ell_a(0)$, and the goal of $a$ is to have its opinion be as
close as possible to $\tau^*$. We view the opinion $X_a$ as an estimator of $\tau^*$, and note that initially, due to the properties
of $\Phi_a$, this estimator is unbiased, i.e., $\mathrm{mean}(X_a(0) - \tau^*) = 0$. It is required that at any point in the execution, the
opinion $X_a$ remains an unbiased estimator of $\tau^*$, and the goal of $a$ is to minimize its Mean Square Error (MSE).

Due to this simple relation between internal clocks and opinions, in the remainder of this paper we shall
adopt the latter perspective and concern ourselves only with optimizing the opinions of sensors as estimators for
$\tau^*$, without further discussing the internal clocks.

Rounds. For simplicity of presentation, we assume that the execution proceeds in discrete steps, or rounds.
We stress, however, that the rounds represent the order in which communication events occur (as determined by
the meeting pattern, see below), and do not necessarily correspond to the actual time. Given an algorithm $A$,
the opinion maintained by the algorithm at round $t$ (where $t$ is a non-negative integer) at sensor $a$ is denoted by
$X_a(t, A)$. As mentioned, the algorithm aims to keep this value as close as possible to $\tau^*$. When $A$ is clear from
the context, we may omit it and write $X_a(t)$ instead.

In each round $t \ge 1$, each sensor may first choose to shift (or not) its opinion, and then, if specified in the
meeting pattern, it observes another specified sensor, thus obtaining some information. To summarize, in each
round, a sensor executes the following consecutive actions: (1) Perform internal computation; (2) Perform an
opinion-shift: $X_a(t) = X_a(t-1) + \Delta(x)$; and (3) Observe (or not) another sensor. For simplicity, all three
operations are assumed to occur instantaneously, that is, in zero time.

Mobility and adversarial independent meeting patterns. In cases where sensors are embedded in a Euclidean
space, the distances between the positions of sensors may impact the possible interactions. To account for physical
mobility, and to be as general as possible, we assume that an oblivious adversary controls the meeting pattern. That
is, the adversary decides (before the execution starts), for each round, which sensor observes which other sensor.

A model that includes an unlimited adversary that controls the meeting pattern is in some sense too general^5.
In this preliminary work on the subject, we restrict the adversary to provide only independent meeting patterns, in
which the set of sensors in the transitive history of each observing sensor is disjoint from that of the observed
sensor. As indicated by this work, the case of independent meeting patterns is already complex. We leave it to
future work to handle dependent meeting patterns.

^4 In order for the model to include environmental cues, one or more of the sensors can be taken to represent the global clock. The
initial times of these sensors are chosen according to highly concentrated distributions, $\Phi_a$, around $\tau^*$ and remain fixed thereafter.

^5 For example, in our model we assume that sensors are anonymous, but we compare such algorithms to the best possible algorithm
that knows the identities of sensors and the whole meeting pattern in advance. In such a case, an arbitrary interaction pattern can match all
sensors such that interactions occur only within pairs. As the sensors are anonymous, they cannot distinguish this case from other, more
uniform, meeting patterns and hence cannot be expected to act as efficiently as algorithms with identified sensors. Some limitations on
the adversarial interaction network are therefore required.
Formally, given a pattern of meetings $\mathcal{P}$, a sensor $a$ and a round $t$, we first define the set of relevant sensors
of $a$ at time $t$, denoted by $\mathcal{R}_a(t, \mathcal{P})$. At time zero, we define $\mathcal{R}_a(0, \mathcal{P}) := \{a\}$, and at round $t$,
$\mathcal{R}_a(t, \mathcal{P}) := \mathcal{R}_a(t-1, \mathcal{P}) \cup \mathcal{R}_b(t-1, \mathcal{P})$ if $a$ observes $b$ at time $t-1$ (otherwise $\mathcal{R}_a(t, \mathcal{P}) := \mathcal{R}_a(t-1, \mathcal{P})$). A meeting
pattern $\mathcal{P}$ is called independent if whenever some sensor $a$ observes a sensor $b$ at some time $t$, then
$\mathcal{R}_a(t-1, \mathcal{P}) \cap \mathcal{R}_b(t-1, \mathcal{P}) = \emptyset$. Note that an independent meeting pattern guarantees that given $\tau^*$, the internal clocks of two
interacting sensors are independent. However, given $\tau^*$ and the internal clock of $a$, the internal clock of $b$ and the
relative time measurement between them are dependent (this point is explained in further detail in Section 3).
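
The definition above can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the meeting pattern is encoded as a list of rounds, each round being a list of (observer, observed) pairs, with at most one observation per observer per round, and the sets consulted at a meeting are those accumulated from all earlier rounds. The function name and the encoding are hypothetical.

```python
def is_independent(num_sensors, rounds):
    """Return True if the meeting pattern satisfies the independence condition.

    rounds: list of rounds; each round is a list of (a, b) pairs meaning
    "sensor a observes sensor b" (hypothetical encoding).
    """
    # Relevant set of every sensor at time zero is the sensor itself.
    relevant = [frozenset([a]) for a in range(num_sensors)]
    for pairs in rounds:
        updated = list(relevant)
        for a, b in pairs:
            # Transitive histories of observer and observed must be disjoint.
            if relevant[a] & relevant[b]:
                return False
            updated[a] = relevant[a] | relevant[b]
        relevant = updated
    return True


# Sensor 0 observes 1 while 2 observes 3; then 0 observes 2: independent.
print(is_independent(4, [[(0, 1), (2, 3)], [(0, 2)]]))
# Sensor 0 observes sensor 1 twice: not independent.
print(is_independent(3, [[(0, 1)], [(0, 1)]]))
```

The second example shows why such patterns rule out repeated contact with the same sensor: after the first meeting, the observed sensor already belongs to the observer's relevant set.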

Note that independent meeting patterns are not very good representatives of communication in static
networks^6. On the other hand, independent meeting patterns fit well with highly stochastic communication
patterns over short time scales, in which each sensor observes only a few other arbitrary sensors. In this sense, such patterns
can be considered as representing an extreme case of dynamic systems.

Because sensors have no control of when their next interaction will occur, or if it will occur at all, we require 
that estimates at each sensor be as accurate as possible at any point in time. This requirement is stronger than the 
liveness property that is typically required from distributed algorithms [?].

Convergence time. Consider a meeting pattern $\mathcal{P}$. Given small $\epsilon > 0$, the convergence time $T(\epsilon)$ of an
algorithm $A$ is defined as the minimal number of observations made by the typical sensor until its variance is less
than $\epsilon^2$. More formally, let $\rho$ denote the first round at which more than half of the population satisfies
$\mathrm{var}(X_a(t, A)) < \epsilon^2$. For each sensor $a$, let $R(a)$ denote the number of observations made by $a$ until time $\rho$. The
convergence time $T(\epsilon)$ is defined as the median of $R(a)$ over all sensors $a$. Note that $T(\epsilon)$ is a lower bound on $\rho$,
since $\rho \ge R(a)$ for every sensor $a$.
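
The definition can be read as a simple procedure. The sketch below assumes, purely for illustration, that the per-round variances and the cumulative observation counts of all sensors are available as tables indexed by round and sensor; the function name and the input layout are hypothetical.

```python
from statistics import median


def convergence_time(variances, observation_counts, eps):
    """Median number of observations at the first round rho at which more
    than half of the sensors have variance below eps**2.

    variances[t][a]          : variance of sensor a's opinion at round t
    observation_counts[t][a] : observations made by sensor a up to round t
    Returns None if no such round exists in the supplied tables.
    """
    num_sensors = len(variances[0])
    for t, row in enumerate(variances):
        converged = sum(1 for v in row if v < eps ** 2)
        if converged > num_sensors / 2:
            return median(observation_counts[t])
    return None


# Illustrative tables (assumptions): three sensors, two rounds.
variances = [[1.0, 1.0, 1.0], [0.5, 0.005, 0.005]]
counts = [[0, 0, 0], [1, 1, 2]]
print(convergence_time(variances, counts, eps=0.1))
```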

Communication. We assume that sensors are anonymous and hence, in particular, they do not know who they 
observe. Conversely, for the sake of lower bounds, we allow a much more liberal setting, in which sensors have 
unique identifiers and know who they interact with. 

When a sensor a observes another sensor b at some round t, the information transferred in this interaction 
contains a passive component and, possibly, a complementary active one. The passive component is a noisy 
relative deviation measurement between their opinions: 

$$d_{ab}(t) = X_b(t) - X_a(t) + \eta,$$

where the additive noise term, $\eta$, is chosen from the noise probability distribution $N(\eta) \in \mathcal{F}$, whose variance is
known to the sensors. (Note that this measurement is equivalent to the relative deviation measurement between
the sensors' current local times because all clocks tick at the same pace.)

2.2 Elementary algorithms 

Our reference for evaluating performance is algorithm Opt, which operates under the most liberal version of our
model, which carries no restrictions on memory, communication capacities or internal computational power, and
provides the best possible estimators at any sensor and at any time (we further assume that sensors acting under
Opt know the meeting pattern in advance). In general, algorithm Opt may use complex calculations over very
wasteful memories that include detailed distribution density functions and, possibly, accumulated measurements.
Our main goal is to identify an algorithm whose performance is highly competitive with that of Opt but wherein
communication and memory are economically used, and the local computations are simple. Indeed, when it comes
to applications to tiny and limited processors, simplicity and economic use of communication are crucial
restrictions.

^6 Indeed, in such patterns a sensor will not contact the same sensor twice, which contradicts many natural communication schemes
in static networks. We note, however, that in some cases, a sequence of multiple consecutive observations between sensors can be
compressed into a single observation of higher accuracy, thus reducing the dependencies between observations, and possibly converting a
dependent meeting pattern into an independent one. For example, if sensors have unique identities and sensor $a$ observes sensor $b$ several
times in a row, and it is guaranteed that sensor $b$ did not change its state during these observations, then these observations can be treated
by $a$ as a single, more accurate, observation of $b$.

An algorithm is called elementary if the internal state of each sensor $a$ is some real^7 number in $\mathbb{R}$, and,
more importantly, the internal computations that a sensor can perform consist of a constant number of basic
arithmetic operations, namely: addition, subtraction, multiplication, and division.

2.3 Competitive analysis 

Fix a finite family $\mathcal{F}$ of smooth pdfs centered at zero (see the definition of smoothness in Section 2.4),
and fix an assignment of a distribution $\Phi_a \in \mathcal{F}$ to each sensor $a$. For an algorithm $A$ and an independent
meeting pattern $\mathcal{P}$, let $X_a(t, A, \mathcal{P})$ denote the random variable indicating the opinion of sensor $a$ at round $t$.
Let $\mathrm{mean}(X_a(t, A, \mathcal{P}))$ and $\mathrm{var}(X_a(t, A, \mathcal{P}))$ denote, respectively, the mean and variance of $X_a(t, A, \mathcal{P})$, where
these are taken over all possible random initial opinions, communication errors, and possibly, coins flipped by
the algorithm. Note that the unbiasedness requirement means that $\mathrm{mean}(X_a(t, A, \mathcal{P})) = \tau^*$. An algorithm $A$ is
called $\lambda$-competitive if for any independent pattern of meetings $\mathcal{P}$, any sensor $a$, and any time $t$, we have:
$\mathrm{var}(X_a(t, A, \mathcal{P})) \le \lambda \cdot \mathrm{var}(X_a(t, \mathrm{Opt}, \mathcal{P}))$.

2.4 Fisher information and the Cramér-Rao bound


The Fisher information is a standard way of evaluating the amount of information that a set of random
measurements holds about an unknown parameter $\tau$ of the distribution from which these measurements were taken. We
provide here some definitions for this notion; for more information the reader may refer to [8, 47].

A single-variable probability density function (pdf) $\Phi$ is called smooth if it satisfies the following
conditions, as stated by Stam [42]: (1) $\Phi(x) > 0$ for any $x \in \mathbb{R}$, (2) the derivative $\Phi'$ exists, and (3) the integral
$\int \frac{(\Phi'(y))^2}{\Phi(y)}\,dy$ exists, i.e., $\Phi'(y) \to 0$ rapidly enough for $|y| \to \infty$. Note that, in particular, these conditions
hold for natural distributions such as the Gaussian distribution. Recall that we consider a finite set $\mathcal{F}$ of smooth
one-variable pdfs, one of them being the noise distribution $N(\eta)$, and all of which are centered at zero.

For a smooth pdf $\Phi$, let $J_\Phi := \int \frac{(\Phi'(y))^2}{\Phi(y)}\,dy$ denote the Fisher information in the parameterized family
$\{\Phi(x, \tau)\}_{\tau \in \mathbb{R}} = \{\Phi(x - \tau)\}_{\tau \in \mathbb{R}}$ with respect to $\tau$. In particular, let $J_N = J_{N(\eta)}$ denote the Fisher information
in the parameterized family $\{N(\eta - \tau)\}_{\tau \in \mathbb{R}}$. More generally, consider a multi-variable pdf family
$\{\Phi(z_1 - \tau, z_2, \ldots, z_k)\}_{\tau \in \mathbb{R}}$, where $\tau$ is a translation parameter. The Fisher information in this family with respect to $\tau$ is
defined as:

$$J_\Phi = \int \frac{1}{\Phi(z_1 - \tau, z_2, \ldots, z_k)} \left[ \frac{d\Phi(z_1 - \tau, z_2, \ldots, z_k)}{d\tau} \right]^2 dz_1\, dz_2 \cdots dz_k$$

(if the integral exists).

As previously noted [?], since $\tau$ is a translation parameter, Fisher information is both unique (there is no freedom
in choosing the parametrization) and independent of $\tau$.

The Fisher information derives its importance from its association with the Cramér-Rao inequality [?]. This
inequality lower bounds the variance of the best possible estimator of $\tau^*$ by the reciprocal of the Fisher information
that corresponds to the random variables on which this estimator is based.

Theorem 2.1. [The Cramér-Rao inequality] Let $X$ be any unbiased estimator of $\tau^* \in \mathbb{R}$ which is based on a
multi-variable sample $z = (z_1, z_2, \ldots, z_k)$ taken from $\Phi(z_1 - \tau^*, z_2, \ldots, z_k)$. Then $\mathrm{var}(X) \ge 1/J_\Phi$.


^7 We assume real numbers for simplicity. It seems reasonable to assume that similar results could be obtained when a sufficiently
accurate approximation is stored instead of the real numbers.

Initial Fisher-tightness: To define the initial Fisher-tightness parameter $\Delta_0$, we first define the Fisher-tightness
of a single-variable smooth distribution $\Phi$ centered at zero as $\Delta(\Phi) = \mathrm{var}(\Phi) \cdot J_\Phi$. Note that, by the Cramér-Rao
bound, $\Delta(\Phi) \ge 1$ for any such distribution $\Phi$. Moreover, equality holds if $\Phi$ is Gaussian [8]. Recall that $\mathcal{F}$ is
the finite collection of smooth distributions containing the distributions $\Phi_a$ governing the initial opinions of
sensors. The initial Fisher-tightness $\Delta_0$ is the maximum of the Fisher-tightness over all distributions in $\mathcal{F}$ and
the noise distribution. Specifically, let $\Delta_0 = \max\{\Delta(\Phi) \mid \Phi \in \mathcal{F}\}$. Two important observations are:

• If $\mathcal{F}$ contains a constant number of distributions then $\Delta_0$ is a constant.

• If the distributions in $\mathcal{F}$ are all Gaussians then $\Delta_0 = 1$.
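
To illustrate the Fisher-tightness numerically, the sketch below integrates $(\Phi'(y))^2/\Phi(y)$ with a plain trapezoid rule and multiplies the result by the variance. The logistic distribution is used here only as an example of a smooth non-Gaussian pdf and is not taken from the text; the function names, the integration range and the step count are likewise assumptions.

```python
import math


def fisher_information(pdf, dpdf, lo, hi, steps=20000):
    """Trapezoid approximation of the integral of (pdf'(y))^2 / pdf(y) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * dpdf(y) ** 2 / pdf(y)
    return total * h


def gaussian(sigma):
    """Zero-centered Gaussian: returns (pdf, derivative of pdf, variance)."""
    def pdf(y):
        return math.exp(-y * y / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

    def dpdf(y):
        return -y / (sigma * sigma) * pdf(y)

    return pdf, dpdf, sigma * sigma


def logistic(s):
    """Zero-centered logistic with scale s: returns (pdf, derivative of pdf, variance)."""
    def pdf(y):
        return 1.0 / (4.0 * s * math.cosh(y / (2 * s)) ** 2)

    def dpdf(y):
        return -math.tanh(y / (2 * s)) / s * pdf(y)

    return pdf, dpdf, (s * math.pi) ** 2 / 3.0


def fisher_tightness(dist, half_width):
    """Variance times Fisher information, integrating over [-half_width, half_width]."""
    pdf, dpdf, variance = dist
    return variance * fisher_information(pdf, dpdf, -half_width, half_width)


print(fisher_tightness(gaussian(2.0), 24.0))   # close to 1 (Gaussian case)
print(fisher_tightness(logistic(1.0), 40.0))   # close to pi^2 / 9, slightly above 1
```

For a family consisting of these two distributions, the initial Fisher-tightness would be the larger of the two values, that is, roughly 1.1.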

3 Technical difficulties 

It is known that, for a single sensor, one can associate weights to samples so that a weighted-average procedure
can fuse them optimally [?]. The proof therein relies on the assumption that all probability distributions are
Gaussians whose functional forms, indeed, depend on their second moments only. Our setting is more complex,
since it includes arbitrary differentiable pdfs and multiple distributed sensors whose relative opinions constantly
change.

The extension to multiple mobile sensors adds another dimension to the problem. One difficulty lies in the
fact that the partial knowledge held by each sensor is relative (e.g., an estimation of the deviation between the
sensor's opinion and $\tau^*$) and hence may require the sensors to carefully fuse perspectives other than their own.
This difficulty is enhanced as the sensors constantly shift their opinions. Indeed, for elementary algorithms,
where memory is restricted to a single parameter, storing the sum of previous shifts in the memory of a sensor
is possible, but could drastically limit the degrees of freedom for encoding other information. On the other hand,
without encoding previous shifts, it is not clear how sensor $a$ should treat information it had received from $b$.

In addition, compression of memory and communication appears to be detrimental. Indeed, maintaining 
and communicating highly detailed memories can, in some cases, significantly improve a sensor’s assessment 
of the target value. However, maintaining a high degree of detail requires storing an arbitrary number of pdf 
moments which may grow with every interaction. Hence, it is not clear how to compress the information into 
few meaningful parameters while avoiding the accumulation of errors and runaway behavior. 

Several technical difficulties arise when attempting to bound the performance of different algorithms. In natural
types of algorithms, sensors' memories can be regarded as maintaining pdfs that summarize their knowledge
regarding their deviation from the target value r*. One of the analysis difficulties corresponds to the fact that the 
pdf held by a sensor at round t depends on many previous deviation measurements in a non-trivial way, and hence 
the variance of a realization of the pdf does not necessarily correspond to the variance of the sensors’ opinion, 
when taking into account all possible realizations of all measurements. Hence, one must regard each pdf as a 
multi-variable distribution. A second problem has to do with dependencies. The independent meeting pattern 
guarantees that the memory pdf’s of two interacting sensors are independent, yet, given the pdf of the observing 
sensor, the pdf of the observed sensor and the deviation measurement become dependent. Such dependencies 
make it difficult to track the evolution of a sensor’s accuracy of estimation over time. Indeed, to tackle this issue, 
we had to extend the Fisher information inequality 1136114211^ to a multi-variable dependent convolution case. 

4 Lower bounds on the variance of Opt 

In this section we provide lower bounds on the performances of algorithm Opt over a fixed independent pattern
of meetings $\mathcal{P}$. Note that we are interested in bounding the performances of Opt and not in specifying its
instructions. Identifying the details of Opt may still be of interest, but it is beyond the scope of this paper.

For simplicity of presentation, we assume that the rules of Opt are deterministic. We note, however, that our
results can easily be extended to the case that Opt is probabilistic. For simplicity of notation, since this section
deals only with algorithm Opt acting over $\mathcal{P}$, we use variables, such as the opinion $X_a(t)$ and the memory $Y_a(t)$
of sensor a, without parametrizing them by either Opt or $\mathcal{P}$.

Under algorithm Opt, we assume that each sensor a holds initially not only the variance of $\Phi_a$ but the precise
functional form of the distribution $\Phi_a$ (recall, $\Phi_a$ is centered at zero). In addition, we assume that sensors have
unique identifiers and that each sensor knows the whole pattern $\mathcal{P}$ in advance. Moreover, we assume that each
sensor a knows, for each other sensor b, the pdf governing b’s initial opinion. All this information is stored in
one designated part of the memory of a.

Since Opt does not have any bandwidth constraints, we may assume, without loss of generality, that whenever
some sensor a observes another sensor b, it obtains the whole memory content of b. Since Opt is deterministic, the
previous opinion-shifts of a sensor can be extracted from its interaction history, which is, without loss of generality,
encoded in its memory.* Hence, when sensor a observes sensor b at some round t, and receives b’s memory together with
the noisy measurement $d_{ab}(t) = X_b(t) - X_a(t) + \eta$, sensor a may extract all previous opinion-shifts of both itself
and b, treating the measurement $d_{ab}(t)$ as a noisy measurement of the deviation between the initial opinions, i.e.,
$d_{ab}(0) = x_b(0) - x_a(0) + \eta$. In other words, to understand the behavior of Opt at round t, one may assume that
sensors never shift their opinions until round t, when they use all memory they gathered to shift their opinion
in the best possible manner.† It follows that apart from the designated memory part that all sensors share, the
memory $M_a(t)$ of sensor a at round t contains the initial opinion $X_a(0)$ and a collection $Y_a(t-1) := \{d_{bc}(0)\}_{bc}$
of relative deviation measurements between initial opinions. That is, $M_a(t) = (X_a(0), Y_a(t-1))$. This multi-valued
memory variable $M_a(t)$ contains all the information available to a at round t. In turn, this information is
used by the sensor to obtain its opinion $X_a(t)$, which is required to serve as an unbiased estimator of $\tau^*$.

4.1 The Fisher Information of sensors 

We now define the notion of the Fisher Information associated with a sensor a at round t. This definition will be
used to bound from below the variance of $X_a(t)$ under algorithm Opt.

Consider the multi-valued memory variable $M_a(t) = (X_a(0), Y_a(t-1))$ of sensor a at round t. Note that
$Y_a(t-1)$ is independent of $\tau^*$. Indeed, once the adversary decides on the value $\tau^*$, all sensors’ initial opinions are
chosen with respect to $\tau^*$. Hence, since sensors’ memories contain only relative deviations between opinions,
the memories by themselves do not contain any information regarding $\tau^*$. In contrast, given $\tau^*$, the random
variables $Y_a(t-1)$ and $X_a(0)$ are, in general, dependent. Furthermore, in contrast to $Y_a(t-1)$, the value of
$X_a(0)$ depends on $\tau^*$, as it is chosen according to $\Phi_a(x - \tau^*)$. Hence, $M_a(t)$ is distributed according to a
pdf family $\{(m_a(t), \tau)\}$ parameterized by a translation parameter $\tau$. Based on $M_a(t)$, the sensor produces an
unbiased estimation $X_a(t)$ of $\tau^*$, that is, it should hold that $\mathrm{mean}(X_a(t) - \tau^*) = 0$, where the mean is taken
with respect to the distribution of the random multi-variable $M_a(t)$.

Definition: The Fisher Information (FI) of sensor a at round t, termed $J_a(t)$, is the Fisher information in the
parameterized family $\{(m_a(t), \tau)\}_{\tau \in \mathbb{R}}$ with respect to $\tau$.

By the Cramér-Rao bound, the variance of any unbiased estimator used by sensor a at round t is bounded
from below by the reciprocal of the FI of sensor a at that time. That is, we have:

Lemma 4.1. $\mathrm{var}(X_a(t)) \ge 1/J_a(t)$.
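
As a sanity check of the Cramér-Rao bound in the simplest situation (a single sensor at round 0, whose memory is only its initial opinion), the sketch below numerically evaluates the Fisher information of a translation family and compares its reciprocal with the variance. The helper name `location_fisher_information`, the integration range, and the choice of a Gaussian and a logistic distribution are illustrative assumptions and do not come from the paper.

```python
import math

def location_fisher_information(score, pdf, lo=-60.0, hi=60.0, steps=120_000):
    """Midpoint-rule estimate of the integral of score(x)^2 * pdf(x),
    i.e. the Fisher information of the translation family pdf(x - tau)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += score(x) ** 2 * pdf(x)
    return total * h

# Gaussian with standard deviation sigma (variance sigma^2).
sigma = 2.0
def gauss_pdf(x):
    return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
def gauss_score(x):
    return -x / sigma**2                      # d/dx log pdf(x)

# Logistic with scale s (variance s^2 * pi^2 / 3).
s = 1.0
def logistic_pdf(x):
    e = math.exp(-abs(x) / s)                 # pdf is symmetric; abs avoids overflow
    return e / (s * (1 + e) ** 2)
def logistic_score(x):
    return -math.tanh(x / (2 * s)) / s        # d/dx log pdf(x)

j_gauss = location_fisher_information(gauss_score, gauss_pdf)
j_logistic = location_fisher_information(logistic_score, logistic_pdf)
var_gauss = sigma**2
var_logistic = s**2 * math.pi**2 / 3

print(var_gauss * j_gauss)        # product var * J for the Gaussian
print(var_logistic * j_logistic)  # product var * J for the logistic
```

In this sketch the Gaussian meets the bound with equality, while the logistic has variance strictly above 1/J, which is the kind of gap the analysis later has to account for.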

4.2 An upper bound on the Fisher Information $J_a(t)$

Lemma 4.1 implies that lower bounds on the variance of the opinion of a sensor can be obtained by bounding
from above the corresponding FI. To this end, our next goal is to prove the following recursive inequality.

* In case Opt is probabilistic, previous shifts can be extracted from the memory plus the results of coin flips, which may be encoded in
the memory of the sensor as well.

† This observation implies, in particular, that previous opinion-shifts of sensors do not affect subsequent estimators in a way that may
cause a conflict (a conflict may arise, e.g., when optimizing one sensor at one time necessarily makes estimators at another sensor, at a
later time, sub-optimal); hence algorithm Opt is well-defined.



Theorem 4.2. The FI of sensor a under algorithm Opt satisfies: $J_a(t+1) \le J_a(t) + \frac{1}{\frac{1}{J_b(t)} + \frac{1}{J_N}}$, where b is the sensor observed by a at round t.

Proof. Consider the case that at round t, sensor a observes sensor b. After the interaction, the random multi-variable
$Y_a(t)$ is composed of: (1) the random variable $D_{ab}(0) := X_b(0) - X_a(0) + N$, corresponding to the
noisy deviation measurement between the initial opinions of a and b, and (2) the relative deviation measurements
in both $Y_a(t-1)$ and $Y_b(t-1)$. We now aim at calculating the FI $J_a(t+1)$ available to sensor a at time
$t+1$, with respect to the parameter $\tau$. This is the FI with respect to $\tau$, in the multi-variable
$M_a(t+1) = (X_a(0), Y_a(t)) = (X_a(0), D_{ab}(0), Y_a(t-1), Y_b(t-1))$, where $X_a(0)$ is distributed according to $\Phi_a(x - \tau^*)$.
Taking $\tilde{X}_b(0) := X_a(0) + D_{ab}(0) = X_b(0) + N$, this latter FI becomes the same as the FI in the random variables
$(X_a(0), \tilde{X}_b(0), Y_a(t-1), Y_b(t-1))$. Since the meeting pattern is independent, then given the environment value
$\tau^*$, the random multi-variable $(X_a(0), Y_a(t-1))$ is independent of the random multi-variable $(\tilde{X}_b(0), Y_b(t-1))$.
By the additivity property of the Fisher information with respect to independent random multi-variables (see [42]),
the FI $J_a(t+1)$ therefore equals the FI $J_a(t)$ (which is the FI in the random multi-variable $(X_a(0), Y_a(t-1))$)
plus the FI $\tilde{J}_b(t)$ in the random multi-variable $(\tilde{X}_b(0), Y_b(t-1))$, both with respect to $\tau$. That is, we have:

$$J_a(t+1) = J_a(t) + \tilde{J}_b(t). \qquad (2)$$

Let us now focus on the rightmost term in Equation (2) and calculate $\tilde{J}_b(t)$. Given that the target value is some $\tau$,
the distribution of $(\tilde{X}_b(0), Y_b(t-1))$ can be described by the following convolution:

$$f_{\tilde{X}_b(0), Y_b(t-1)}\big[(\tilde{x}_b(0), y_b(t-1)) \mid \tau\big] = \int f_{X_b(0), Y_b(t-1)}\big[(\tilde{x}_b(0) - \eta, y_b(t-1)) \mid \tau\big]\, N(\eta)\, d\eta. \qquad (3)$$

Observe that the right hand side of Equation (3) is a convolution of the distribution of $(X_b(0), Y_b(t-1))$ with the
noise distribution N, where the convolution occurs with respect to the random variable $X_b(0)$. Our goal now is
to bound the Fisher information in this convolution with respect to $\tau$.

The Fisher information inequality [42, 47] bounds the Fisher Information of convolutions of single-variable
distributions. Essentially, the theorem says that if $x$, $y$ and $\tau$ are real values, $K(x - \tau)$, $R(x - \tau)$ and $Q(x - \tau)$ are
parameterized families and $K = R \otimes Q$, then $J_K \le 1/\left(\frac{1}{J_R} + \frac{1}{J_Q}\right)$. To apply this inequality to Equation (3),
we generalize it to distributions with multiple variables, where only one of them is convoluted. We rely on the fact
that the random variable $Y_b(t-1)$ does not depend on $\tau^*$ (recall, it contains only relative deviation measurements).
This fact turns out to be sufficient to overcome the potential complication arising from the fact that given the
environmental value $\tau^*$, the random variable $X_b(0)$ and the random multi-variable $Y_b(t-1)$ are no longer independent.
In Appendix A we prove Lemma A.1, which extends the Fisher information inequality to our multi-variable
(possibly dependent) convolution case, enabling us to prove the inequality $\tilde{J}_b(t) \le 1/\left(\frac{1}{J_b(t)} + \frac{1}{J_N}\right)$. Together with
Equation (2), we obtain the required recursive inequality for the FI. This completes the proof of the theorem. □
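
The recursive inequality can be unrolled along a concrete meeting pattern to obtain numeric upper bounds on every sensor's FI. The following is a minimal sketch under stated assumptions: the function names, the dictionary-based representation of a pattern (a list of rounds, each a list of (observer, observed) pairs, with each sensor observing at most once per round), and the sample values are all hypothetical.

```python
def fi_gain_bound(j_b, j_n):
    """Upper bound on the FI gained by observing a sensor whose FI is j_b
    through a measurement whose noise has FI j_n."""
    return 1.0 / (1.0 / j_b + 1.0 / j_n)

def propagate_fi_bounds(initial_fi, rounds, j_n):
    """Unroll J_a(t+1) <= J_a(t) + 1/(1/J_b(t) + 1/J_N) along a pattern.

    initial_fi: dict mapping sensor name -> J_a(0)
    rounds:     list of rounds; each round is a list of (observer, observed)
    Returns a list of dicts; entry t holds the bounds at round t.
    """
    history = [dict(initial_fi)]
    for meetings in rounds:
        prev = history[-1]
        cur = dict(prev)
        for a, b in meetings:
            # All updates within a round use the values from the round's start.
            cur[a] = prev[a] + fi_gain_bound(prev[b], j_n)
        history.append(cur)
    return history

# Hypothetical pattern: a observes b and c observes d, then a observes c.
pattern = [[("a", "b"), ("c", "d")], [("a", "c")]]
bounds = propagate_fi_bounds({"a": 1.0, "b": 4.0, "c": 0.5, "d": 2.0}, pattern, j_n=2.0)
for t, snapshot in enumerate(bounds):
    print(t, snapshot)
```

Note that each per-observation gain is smaller than both the observed sensor's FI and the noise FI, so a very accurate neighbour is of limited use across a noisy measurement.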


5 A highly-competitive elementary algorithm 

We define an elementary algorithm, termed ALG, and prove that its performances are highly-competitive
with those of Opt. In this algorithm, each sensor a stores in its memory a single parameter $c_a \in \mathbb{R}$ that represents
its accuracy regarding the quality of its current opinion with respect to $\tau^*$. The initial accuracy of sensor a is set
to $c_a(0) = 1/\mathrm{var}(\Phi_a)$. When sensor a observes sensor b at some round t, it receives $c_b(t)$ and $d_{ab}(t)$, and acts as
follows. Sensor a first computes the value $\hat{c}_b(t) = c_b(t)/(1 + c_b(t) \cdot \mathrm{var}(N))$, a reduced accuracy parameter for
sensor b that takes measurement noise into account, and then proceeds as follows:

Algorithm ALG

• Update opinion: $x_a(t+1) = x_a(t) + \frac{d_{ab}(t) \cdot \hat{c}_b(t)}{c_a(t) + \hat{c}_b(t)}$.

• Update accuracy: $c_a(t+1) = c_a(t) + \hat{c}_b(t)$.
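
The two update rules translate almost directly into code. The sketch below is one possible rendering; the `Sensor` class, the function names, and the sample numbers are illustrative assumptions rather than part of the paper.

```python
from dataclasses import dataclass

@dataclass
class Sensor:
    opinion: float    # current opinion of the sensor
    accuracy: float   # accuracy parameter, initialised to 1 / var(initial pdf)

def reduced_accuracy(c_b, noise_var):
    """Accuracy of the observed sensor discounted by the measurement noise:
    c_b / (1 + c_b * var(N))."""
    return c_b / (1.0 + c_b * noise_var)

def alg_observe(a, c_b, d_ab, noise_var):
    """One ALG step for observer a, given the observed sensor's accuracy c_b
    and the noisy deviation measurement d_ab."""
    c_hat = reduced_accuracy(c_b, noise_var)
    # The opinion shift uses the accuracy held before this interaction.
    a.opinion += d_ab * c_hat / (a.accuracy + c_hat)
    a.accuracy += c_hat
    return a

# Hypothetical interaction: a is at 10.0 with accuracy 1.0 and measures a
# deviation of 2.0 towards a sensor with accuracy 4.0, noise variance 0.25.
a = alg_observe(Sensor(opinion=10.0, accuracy=1.0), c_b=4.0, d_ab=2.0, noise_var=0.25)
print(a)
```

The shift is a weighted average: sensor a moves toward the measured position of b by the fraction that the reduced accuracy of b contributes to the combined accuracy.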


Fix an independent meeting pattern. First, algorithm ALG is designed such that at all times, the opinion is
preserved as an unbiased estimator of $\tau^*$ and the accuracy, $c_a(t)$, remains equal to the reciprocal of the current
variance of the opinion $X_a(t, \mathrm{ALG})$. Indeed, the following lemma is proven in Appendix B.

Lemma 5.1. At any round t and for any sensor a, we have: (1) the opinion $X_a(t, \mathrm{ALG})$ serves as an unbiased
estimator of $\tau^*$, and (2) $c_a(t) = 1/\mathrm{var}(X_a(t, \mathrm{ALG}))$.

We are now ready to analyze the competitiveness of algorithm ALG, by relating the variance of a sensor a
at round t to the corresponding FI, namely, $J_a(t)$. Recall that Lemma 4.1 gives a lower bound on the variance
of algorithm Opt at a sensor a, which depends on the corresponding FI at the sensor. Specifically, we have:
$\mathrm{var}(X_a(t, \mathrm{Opt})) \ge 1/J_a(t)$. Initially, the FI $J_a(0)$ at a sensor a equals the Fisher information in the parameterized
family $\Phi_a(x - \tau)$ with respect to $\tau$, and hence is at most the initial accuracy $c_a(0)$ times $\Delta_0$. In Equation
(A-3) (see Appendix C) we show that the gain in accuracy following an interaction is always at least as large as the
corresponding upper bound on the gain in Fisher information as given in Theorem 4.2, divided by the initial
Fisher-tightness $\Delta_0$. That is: $c_a(t+1) - c_a(t) \ge \frac{1}{\Delta_0} \cdot \frac{1}{\frac{1}{J_b(t)} + \frac{1}{J_N}}$. Informally, this property of ALG can be
interpreted as maximizing the Fisher information flow in each interaction up to an approximation factor of $\Delta_0$.
By induction (see proof in Appendix C), we obtain the following.

Lemma 5.2. At every round t, we have $c_a(t) \ge J_a(t)/\Delta_0$.

Lemmas 4.1, 5.1 and 5.2 can now be combined to yield: $\mathrm{var}(X_a(t, \mathrm{ALG})) \le \Delta_0 \cdot \mathrm{var}(X_a(t, \mathrm{Opt}))$. This establishes
Theorem 1.2. □

Note that if $|\mathcal{F}| = O(1)$ (i.e., $\mathcal{F}$ contains a constant number of distributions, independent of the number
of sensors) then the initial Fisher-tightness $\Delta_0$ is a constant, and hence Theorem 1.2 states that ALG is
constant-competitive at any sensor and at any time. We now aim to identify those cases where ALG performs even better.
One such case is when the distributions in $\mathcal{F}$ as well as the noise distribution $N(\eta)$ are all Gaussians. In this
case $\Delta_0 = 1$ and Theorem 1.2 therefore states that the variance of ALG equals that of Opt, for any sensor at
any time. Another case is when $|\mathcal{F}|$ is a constant, the noise is Gaussian, but both the population size n and the
round t go to infinity. In this case, analyzed in Appendix D, as time increases, the performances of ALG become
arbitrarily close to those of Opt.
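
The relation between ALG's accuracy and the Fisher-information upper bound can be explored numerically by running both recursions side by side. The sketch below is an illustration under assumptions that are not taken from the paper: the function names, the sample meeting pattern, Gaussian noise (so the noise FI is the reciprocal of its variance), and logistic initial distributions as the non-Gaussian example (variance $s^2\pi^2/3$, location Fisher information $1/(3s^2)$, hence variance times FI equal to $\pi^2/9$). The quantity tracked in `j` is the upper bound obtained by unrolling the recursive inequality of Theorem 4.2, not the exact FI under Opt.

```python
import math

def run_recursions(sensors, rounds, noise_var):
    """sensors: name -> (variance, fisher_information) of the initial pdf.
    Returns ALG's accuracies c and the unrolled upper bounds j on the FI."""
    c = {name: 1.0 / var for name, (var, _) in sensors.items()}
    j = {name: fi for name, (_, fi) in sensors.items()}
    for meetings in rounds:
        c_prev, j_prev = dict(c), dict(j)
        for a, b in meetings:
            # ALG: c_a += c_b / (1 + c_b * var(N)) = 1 / (1/c_b + var(N)).
            c[a] = c_prev[a] + 1.0 / (1.0 / c_prev[b] + noise_var)
            # FI bound with Gaussian noise: 1/J_N = var(N).
            j[a] = j_prev[a] + 1.0 / (1.0 / j_prev[b] + noise_var)
    return c, j

def gaussian(var):
    return (var, 1.0 / var)

def logistic(scale):
    return (scale**2 * math.pi**2 / 3, 1.0 / (3 * scale**2))

rounds = [[("a", "b"), ("c", "d")], [("a", "c")]]
params = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 1.5}

c_g, j_g = run_recursions({k: gaussian(v) for k, v in params.items()}, rounds, 0.25)
c_l, j_l = run_recursions({k: logistic(s) for k, s in params.items()}, rounds, 0.25)
tightness = math.pi**2 / 9   # var * FI for a logistic; Gaussian noise contributes 1

print({k: j_g[k] / c_g[k] for k in params})   # all-Gaussian ratios
print({k: j_l[k] / c_l[k] for k in params})   # logistic ratios
```

In the all-Gaussian run the two recursions coincide, while in the logistic run the ratio between the bound and the accuracy stays between 1 and the tightness factor.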

6 The Fisher Channel Capacity and convergence times 

For a fixed independent meeting pattern, the FI $J_a(t)$ at a sensor a and round t was defined in Section 4.1 with
respect to algorithm Opt. We note that this definition applies to any algorithm A as long as it is sufficiently
smooth so that the corresponding Fisher informations are well-defined. This quantity $J_a(t, A)$ would respect the
same recursive inequality as stated in Theorem 4.2, that is, we have: $J_a(t+1, A) \le J_a(t, A) + \frac{1}{\frac{1}{J_b(t, A)} + \frac{1}{J_N}}$. This
directly implies the following:

$$J_a(t+1, A) - J_a(t, A) \le J_N. \qquad (4)$$

The inequality above sets a bound of $J_N$ for the increase in FI per round. In analogy to Channel Capacity as
defined by Shannon, we term this upper bound the Fisher Channel Capacity.

The restriction on information flow as given by the Fisher Channel Capacity can be translated into lower bounds
for the convergence time of algorithm Opt (and hence these bounds also apply to any algorithm). Recall, $T$ is the first round when
we have more than half of the population satisfying $\mathrm{var}(X_a(t)) < \epsilon^2$. By Lemma 4.1, a sensor a with variance
smaller than $\epsilon^2$ must have a large FI, specifically, $J_a(T) > 1/\epsilon^2$. To get some intuition on the convergence time,
assume that the number of sensors is odd, and let $J_0$ denote the median initial FI of sensors (this is the median of
the FI $J_a(0)$, over all sensors a), and assume $J_0 \ll 1/\epsilon^2$. By definition, more than a half of the population have
initial Fisher information at most $J_0$. By the pigeonhole principle, at least one sensor has an FI of at most $J_0$ at
$t = 0$ and at least $1/\epsilon^2$ at $t = T$. Theorem 1.1 follows from the fact that, by Equation (4), this sensor could increase
its FI by at most $J_N$ in each observation.
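
The counting argument above can be turned into a small calculator: a sensor that starts with FI at most the median initial FI and gains at most the noise FI per observation needs at least (1/ε² − median FI) / (noise FI) observations before its FI can reach 1/ε². The sketch below only illustrates this arithmetic; the function name and sample values are hypothetical, and the exact statement of Theorem 1.1 is not reproduced here.

```python
import math

def min_observations(eps, j_median, j_n):
    """Least number of observations before a sensor whose initial FI is at
    most j_median can reach FI 1/eps^2, when each observation adds at most j_n."""
    needed = 1.0 / eps**2 - j_median
    if needed <= 0:
        return 0
    return math.ceil(needed / j_n)

# Hypothetical numbers: target standard deviation 0.1, median initial FI 1,
# noise FI 4 (e.g. Gaussian noise with variance 0.25).
print(min_observations(eps=0.1, j_median=1.0, j_n=4.0))
```

The bound scales like 1/(ε² · noise FI) when the initial information is negligible, so halving the target standard deviation roughly quadruples the number of required observations.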

7 Conclusion 

We provide a fresh approach to the study of clock synchronization, following a purely theoretic distributed
algorithmic type of study and employing techniques from information theory. We have focused on arbitrary, yet
independent, meeting patterns, and demanded the performances of each sensor to be as high as possible at any
point in the execution. We have established lower bounds on the performances of algorithm Opt, the best possible
clock synchronization algorithm operating under the most liberal version of our model. We have identified
algorithm ALG, an extremely simple algorithm whose performances are highly-competitive with those of Opt.
Moreover, under Gaussian conditions, the accuracies of sensors under ALG precisely equal those of Opt.

Algorithm ALG is based on storing and communicating a single accuracy parameter that complements noisy 
deviation measurements, and on internal computations and update rules that are based on weighted-average 
operations. Our proofs rely on an extensive use of the concept of Fisher information. We use the Cramer-Rao 
bound and our definition of a Fisher Channel Capacity to quantify information flows and to obtain lower bounds 
on best possible performance. This opens the door for further rigorous quantifications of information flows within 
collaborative sensors. 

Our information theoretic approach allowed us to tackle the clock synchronization problem in dynamic networks.
In this initial work, we focus on independent meeting patterns, which can be considered as representing
short time scales in highly dynamic scenarios. As evident from this paper, studying independent meeting patterns
is already rather complex. Hence, we leave the study of dependent patterns to future work. Our hope is that
studying such extreme dynamic cases will help to provide tools and insights for future work dealing with other
dynamic scenarios.

This work is further relevant to the problem of collective approximation of environmental values by biological
groups [22].

APPENDIX


A Extending the Fisher inequality 

The Fisher information inequality [42] (see also [5, 36, 47]) applies to three one-variable distribution families
$r(z)$, $p_1(x_1)$, and $p_2(x_2)$ parameterized by $\mu$ such that $r$ is a convolution of $p_1$ and $p_2$, that is,
$r(z) = \int p_1(z - t) \cdot p_2(t)\, dt$. The theorem gives an upper bound on the Fisher information $J_{r(z-\mu)}$ of the family $r(z - \mu)$ (with respect
to $\mu$) based on the Fisher informations $J_{p_1(x_1-\mu)}$ and $J_{p_2(x_2-\mu)}$ of the families $p_1(x_1 - \mu)$ and $p_2(x_2 - \mu)$, respectively.
Specifically, the theorem states that: $(a_1 + a_2)^2 J_{r(z-\mu)} \le a_1^2 J_{p_1(x_1-\mu)} + a_2^2 J_{p_2(x_2-\mu)}$ for any two real numbers $a_1$ and $a_2$. This
in particular implies that $1/J_{r(z-\mu)} \ge 1/J_{p_1(x_1-\mu)} + 1/J_{p_2(x_2-\mu)}$.
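
The single-variable inequality can be checked numerically on a family whose convolution is known in closed form. The sketch below is an illustration under assumptions not taken from the paper: the first distribution is a symmetric mixture of two Gaussians with means ±m and common variance, the second is a centered Gaussian, so their convolution is again such a mixture with the variances added. Function names, integration range, and parameter values are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_fisher_information(m, var, lo=-40.0, hi=40.0, steps=80_000):
    """Location Fisher information of 0.5*N(-m, var) + 0.5*N(m, var), computed
    as the integral of p'(x)^2 / p(x) with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        left, right = normal_pdf(x, -m, var), normal_pdf(x, m, var)
        p = 0.5 * (left + right)
        if p == 0.0:          # both components underflowed; negligible tail
            continue
        dp = 0.5 * (-(x + m) * left - (x - m) * right) / var
        total += dp * dp / p
    return total * h

m, var1, var2 = 2.0, 1.0, 0.5
j_p1 = mixture_fisher_information(m, var1)          # bimodal first distribution
j_p2 = 1.0 / var2                                   # Gaussian second distribution
j_r = mixture_fisher_information(m, var1 + var2)    # their convolution

print(1.0 / j_r, 1.0 / j_p1 + 1.0 / j_p2)

# With m = 0 everything is Gaussian and the two sides should agree.
j_r_gauss = mixture_fisher_information(0.0, var1 + var2)
j_p1_gauss = mixture_fisher_information(0.0, var1)
```

The Gaussian case is the boundary case of the inequality, which is consistent with the paper's observation that the Gaussian setting is where ALG matches Opt.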

The following lemma extends the Fisher information inequality to the case where the distributions $p_1$ and $r$
are composed of multiple, not necessarily independent, variables, where the convolution with $p_2$ takes place over
one of the variables of $p_1$.

Lemma A.1. Let $\{p_1(x_1 - \tau, x_3)\}_{\tau \in \mathbb{R}}$ and $\{p_2(x_2 - \tau)\}_{\tau \in \mathbb{R}}$ be two pdf families with a translation parameter $\tau$
such that $x_1$ and $x_2$ are real variables, $x_3$ is a vector of multiple real-valued variables, and $J_{p_1(x_1-\tau, x_3)}$ and
$J_{p_2(x_2-\tau)}$ are the corresponding Fisher informations with respect to $\tau$. Let
$r(z - \tau, x_3) = \int p_1(t - \tau, x_3) \cdot p_2(z - t)\, dt$ be the convolution of $p_1$ and $p_2$. Then the Fisher information in the
family $\{r(z - \tau, x_3)\}_{\tau \in \mathbb{R}}$ with respect to $\tau$ satisfies:

$$\frac{1}{J_{r(z-\tau, x_3)}} \ge \frac{1}{J_{p_1(x_1-\tau, x_3)}} + \frac{1}{J_{p_2(x_2-\tau)}}.$$


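Proof. We start by using the definition of $r$ as a convolution over $p_2$ and the first variable of $p_1$:

$$r(z - \tau, x_3) = \int p_1(t - \tau, x_3) \cdot p_2(z - t)\, dt.$$

We can insert the density function $p(x_3)$ to rewrite the right hand side as:

$$\int p_1(t - \tau \mid x_3) \cdot p(x_3) \cdot p_2(z - t)\, dt = p(x_3) \int p_1(t - \tau \mid x_3) \cdot p_2(z - t)\, dt.$$

Implying that:

$$r(z - \tau \mid x_3) = \int p_1(t - \tau \mid x_3) \cdot p_2(z - t)\, dt.$$

We now define the distributions $R(z) = r(z - \tau \mid x_3)$ and $P_1(t) = p_1(t - \tau \mid x_3)$ so that the previous equation
becomes:

$$R(z) = \int P_1(t) \cdot p_2(z - t)\, dt,$$

for which we apply the original lemma as first proved by Stam [42] to deduce that for any two real numbers $a_1$
and $a_2$, we have:

$$(a_1 + a_2)^2 J_{R(z-\mu)} \le a_1^2 J_{P_1(x_1-\mu)} + a_2^2 J_{p_2(x_2-\mu)}.$$

Note that $J_{P_1(x_1-\mu)}$ is well defined since, for a given $x_3$, $P_1(x_1 - \mu)$ is proportional to $p_1(x_1 - \mu, x_3)$ and the
Fisher information integral of $p_1(x_1 - \mu, x_3)$ converges when integrating over all possible values of $x_3$. This
implies (see [36]) that the Fisher information in the convolution $R(z - \mu)$ is well defined and the equation above
holds. We now multiply both sides of the equation by $p(x_3)$ and integrate over $x_3$, to obtain:

$$(a_1 + a_2)^2 \int J_{R(z-\mu)}\, p(x_3)\, dx_3 \le a_1^2 \int J_{P_1(x_1-\mu)}\, p(x_3)\, dx_3 + a_2^2 \int J_{p_2(x_2-\mu)}\, p(x_3)\, dx_3. \qquad \text{(A-1)}$$

Plugging in the definitions for Fisher information and $R(z)$, the integral on the left hand side becomes:

$$\int J_{R(z-\mu)}\, p(x_3)\, dx_3 = \int J_{r(z-\mu-\tau \mid x_3)}\, p(x_3)\, dx_3$$
$$= \int\!\!\int \frac{1}{r(z-\mu-\tau \mid x_3)} \left( \frac{\partial r(z-\mu-\tau \mid x_3)}{\partial \mu} \right)^2 dz\; p(x_3)\, dx_3$$
$$= \int\!\!\int \frac{1}{r(z-\mu-\tau \mid x_3)\, p(x_3)} \left( \frac{\partial [r(z-\mu-\tau \mid x_3)\, p(x_3)]}{\partial \mu} \right)^2 dz\, dx_3$$
$$= \int\!\!\int \frac{1}{r(z-\mu-\tau, x_3)} \left( \frac{\partial r(z-\mu-\tau, x_3)}{\partial \mu} \right)^2 dz\, dx_3$$
$$= \int\!\!\int \frac{1}{r(z-\mu-\tau, x_3)} \left( \frac{\partial r(z-\mu-\tau, x_3)}{\partial \tau} \right)^2 dz\, dx_3$$
$$= \int\!\!\int \frac{1}{r(\tilde{z}-\tau, x_3)} \left( \frac{\partial r(\tilde{z}-\tau, x_3)}{\partial \tau} \right)^2 d\tilde{z}\, dx_3 = J_{r(z-\tau, x_3)},$$

where we used $\tilde{z} = z - \mu$ and the fact that $x_3$ is independent of $\tau$.

Similarly, the integral over the first term on the right hand side of Equation (A-1) gives $J_{p_1(x_1-\tau, x_3)}$. The second
term is:

$$\int J_{p_2(x_2-\mu)}\, p(x_3)\, dx_3 = J_{p_2(x_2-\mu)} \int p(x_3)\, dx_3 = J_{p_2(x_2-\mu)} = J_{p_2(x_2-\tau)},$$

by normalization of the distribution of $x_3$.

Finally, Equation (A-1) translates to:

$$(a_1 + a_2)^2 J_{r(z-\tau, x_3)} \le a_1^2 J_{p_1(x_1-\tau, x_3)} + a_2^2 J_{p_2(x_2-\tau)},$$

for any real $a_1$ and $a_2$. Setting $a_1 = 1/J_{p_1(x_1-\tau, x_3)}$ and $a_2 = 1/J_{p_2(x_2-\tau)}$, we finally obtain:

$$\frac{1}{J_{r(z-\tau, x_3)}} \ge \frac{1}{J_{p_1(x_1-\tau, x_3)}} + \frac{1}{J_{p_2(x_2-\tau)}},$$

as desired. □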


B Proof of Lemma 5.1

Fix an independent meeting pattern $\mathcal{P}$ and a pdf assignment, $\Phi_a \in \mathcal{F}$ for each sensor a. Let us now prove the
first part of the lemma, namely, that the opinion $X_a(t)$ at any sensor a and round t serves as an unbiased estimator
for $\tau^*$. The claim holds at time zero, and assume by induction that it holds at round t. Now consider the case that
sensor a observes another sensor b at round t. The opinion of a after the interaction becomes:

$$X_a(t+1) = X_a(t) + \frac{d_{ab}(t) \cdot \hat{c}_b(t)}{c_a(t) + \hat{c}_b(t)} = \frac{X_a(t)\, c_a(t) + X_b(t)\, \hat{c}_b(t)}{c_a(t) + \hat{c}_b(t)} + \frac{\eta \cdot \hat{c}_b(t)}{c_a(t) + \hat{c}_b(t)}. \qquad \text{(A-2)}$$

By induction, $X_a(t)$ and $X_b(t)$ are both unbiased estimators of $\tau^*$. Recall now that the noise distribution $N(\eta)$
is centered around zero. Moreover, observe that at round t, the accuracy at each sensor a, namely $c_a(t)$, is
deterministically defined (given the fixed pattern of meetings, and the assignment of pdfs to the sensors). In
particular, at round t, the accuracies $c_a(t)$, $c_b(t)$ as well as $\hat{c}_b(t)$ are all fixed constants. Equation (A-2) therefore
implies the following.


Claim B.1. At any round t and for any sensor a, the opinion $X_a(t)$ serves as an unbiased estimator of $\tau^*$.

Claim B.1 establishes the first part of the lemma. Let us now turn to prove the second part. This part of the
lemma holds for time $t = 0$ by definition of $c_a(0)$. Assume by induction that for any sensor a at round t it holds
that $c_a(t) = 1/\mathrm{var}(X_a(t))$ and consider time $t + 1$. We now consider an interaction between two sensors at round
t, in which sensor a observes sensor b. The variance of the new opinion of a is:

$$\mathrm{var}(X_a(t+1)) = \mathrm{var}\left( X_a(t) + \frac{d_{ab}(t)\, \hat{c}_b(t)}{c_a(t) + \hat{c}_b(t)} \right) = \mathrm{var}\left( \frac{X_a(t)\, c_a(t) + \hat{c}_b(t)\,(X_b(t) + \eta)}{c_a(t) + \hat{c}_b(t)} \right)$$
$$= \frac{c_a^2(t) \cdot \mathrm{var}(X_a(t)) + \hat{c}_b^2(t) \cdot \mathrm{var}(X_b(t) + \eta)}{(c_a(t) + \hat{c}_b(t))^2}$$
$$= \frac{c_a^2(t) \cdot \mathrm{var}(X_a(t)) + \hat{c}_b^2(t) \cdot (\mathrm{var}(X_b(t)) + \mathrm{var}(N(\eta)))}{(c_a(t) + \hat{c}_b(t))^2}$$
$$= \frac{c_a^2(t) \cdot 1/c_a(t) + \hat{c}_b^2(t) \cdot (1/c_b(t) + \mathrm{var}(N(\eta)))}{(c_a(t) + \hat{c}_b(t))^2} = \frac{c_a^2(t) \cdot 1/c_a(t) + \hat{c}_b^2(t) \cdot 1/\hat{c}_b(t)}{(c_a(t) + \hat{c}_b(t))^2}$$
$$= \frac{c_a(t) + \hat{c}_b(t)}{(c_a(t) + \hat{c}_b(t))^2} = \frac{1}{c_a(t) + \hat{c}_b(t)} = \frac{1}{c_a(t+1)},$$

which proves the induction step. This completes the proof of Lemma 5.1.
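
The algebra of the induction step can be verified with exact rational arithmetic. The sketch below assumes, as the derivation does, that the two opinions and the noise are independent, with variances equal to the reciprocals of the respective accuracies; the function name and the sample values are hypothetical.

```python
from fractions import Fraction

def variance_after_observation(c_a, c_b, noise_var):
    """Variance of the weighted average (c_a*X_a + c_hat*(X_b + eta)) / (c_a + c_hat)
    for independent X_a, X_b, eta with variances 1/c_a, 1/c_b and noise_var.
    Returns the new variance together with the updated accuracy c_a + c_hat."""
    c_hat = c_b / (1 + c_b * noise_var)
    numerator = c_a**2 * (1 / c_a) + c_hat**2 * (1 / c_b + noise_var)
    return numerator / (c_a + c_hat) ** 2, c_a + c_hat

new_var, new_acc = variance_after_observation(Fraction(3, 2), Fraction(5), Fraction(1, 4))
print(new_var, new_acc)   # the two values are reciprocals of each other
```

Using `Fraction` avoids floating-point round-off, so the identity between the new variance and the reciprocal of the updated accuracy is checked exactly rather than approximately.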


C Proof of Lemma 5.2

By Lemma 5.1 and the definition of $\Delta_0$, we have $c_a(0) \ge J_a(0)/\Delta_0$, and hence Lemma 5.2 holds at time 0.

Assume by induction that the lemma holds at round t and consider an interaction at round t when sensor a
observes sensor b. Let $c_N = 1/\mathrm{var}(N(\eta))$. By definition of algorithm ALG, we have:

$$c_a(t+1) - c_a(t) = \hat{c}_b(t) = \frac{1}{1/c_b(t) + 1/c_N}.$$

By the induction hypothesis applied on sensor b, we have:

$$\frac{1}{1/c_b(t) + 1/c_N} \ge \frac{1}{\Delta_0/J_b(t) + 1/c_N}.$$

Again, by definition of $\Delta_0$, we have $\Delta_0 \ge J_N/c_N$. Hence:

$$c_a(t+1) - c_a(t) \ge \frac{1}{\Delta_0} \cdot \frac{1}{\frac{1}{J_b(t)} + \frac{1}{J_N}}. \qquad \text{(A-3)}$$

This means that the gain in accuracy at sensor a following an observation of sensor b is, up to a multiplicative
factor of $\Delta_0$, at least as large as the corresponding gain in FI of the sensor (operating under Opt).

Finally, applying the induction hypothesis for sensor a at round t, we have $c_a(t) \ge J_a(t)/\Delta_0$. Plugging this
in Equation (A-3) we obtain:

$$c_a(t+1) \ge \frac{1}{\Delta_0} \cdot \left( J_a(t) + \frac{1}{\frac{1}{J_b(t)} + \frac{1}{J_N}} \right) \ge J_a(t+1)/\Delta_0,$$

where the second inequality holds by Theorem 4.2. This completes the proof of the lemma.

D On the performances of ALG at large times 

We now investigate the performances of algorithm ALG at large times, and show that as time increases, the 
variance of ALG becomes arbitrarily close to zero, and moreover, the performances of ALG become closer and 
closer to those of Opt. 

The depth $D(\mathcal{P})$ of a given independent meeting pattern $\mathcal{P}$ is defined as the largest round t for which some
sensor observes another sensor. For simplicity, we assume synchronous meeting patterns in which at each round
each sensor observes another sensor, but our results can be easily extended to the case where the total number of
interactions per sensor is roughly the depth $D(\mathcal{P})$. Note that for any population with n sensors, the depth of
an independent synchronous meeting pattern is at most $\log_2 n$. In particular, the depth is finite for populations of
a fixed size. Since our goal is to investigate the behavior of ALG at large times, whenever we consider a round t,
we only inspect populations and corresponding meeting patterns for which the depth is at least t.

In the remainder of this section we fix a family of distributions $\mathcal{F}$ and a noise distribution $N(\eta)$. Given a
round t, let $\mathrm{var}_{\sup}(t)$ denote the supremum of $\mathrm{var}(X_a(t, \mathrm{ALG}))$, taken over (1) all possible populations
$A_n = \{a_1, a_2, \ldots, a_n\}$, for $n = 1, 2, \ldots$, (2) all assignments of distributions $\Phi_a \in \mathcal{F}$ to the sensors in $A_n$, (3) all
meeting patterns (with depth at least t), and (4) all sensors $a_i \in A_n$. Our next claim implies that as time
increases, the variance of ALG becomes arbitrarily close to zero.

Claim D.1. $\lim_{t \to \infty} \mathrm{var}_{\sup}(t) = 0$.

Proof. For a round t, let $c_{\inf}(t) = 1/\mathrm{var}_{\sup}(t)$. Note that $\mathrm{var}_{\sup}(0)$ is precisely the maximal variance over the
distributions in $\mathcal{F}$. Hence, $c_{\inf}(0)$ is some positive constant (that depends on $\mathcal{F}$ only).

By the definition of $c_{\inf}(t)$ and by Lemma 5.1, it follows that $c_{\inf}(t)$ is the infimum of $c_a(t)$, the accuracy
of a sensor a at round t operating under algorithm ALG, taken over all possible populations, all assignments of
distributions $\Phi_a \in \mathcal{F}$ to sensors, all meeting patterns, and all sensors a.

When sensor a observes sensor b at round t, the gain in accuracy for sensor a is:

$$\frac{1}{1/c_b(t) + \mathrm{var}(N(\eta))} \ge \frac{1}{2} \cdot \min\left\{ c_b(t), \frac{1}{\mathrm{var}(N(\eta))} \right\}.$$

It follows that $c_{\inf}(t)$ increases in a single round by either at least a multiplicative factor of 3/2 or by at least an
additive constant of $1/(2\,\mathrm{var}(N(\eta)))$. This implies that $\lim_{t \to \infty} c_{\inf}(t) = \infty$. The proof of the claim now
follows by the definition of $c_{\inf}(t)$. □
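
The growth argument in the proof can be illustrated by iterating the gain formula in the case where the observed sensor is always exactly as accurate as the observer, which yields a lower-bound sequence for the infimum accuracy. This is a sketch with hypothetical names and starting values, not a computation from the paper.

```python
def accuracy_lower_bound_sequence(c0, noise_var, rounds):
    """Iterate c <- c + 1 / (1/c + var(N)), starting from accuracy c0."""
    seq = [c0]
    for _ in range(rounds):
        c = seq[-1]
        seq.append(c + 1.0 / (1.0 / c + noise_var))
    return seq

noise_var = 1.0
seq = accuracy_lower_bound_sequence(c0=0.01, noise_var=noise_var, rounds=30)
print(seq[:4], seq[-1])
```

While the accuracy is below the reciprocal of the noise variance the sequence grows geometrically (factor at least 3/2 per round); afterwards it grows at least linearly, by at least half the reciprocal of the noise variance per round. Either way it is unbounded.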

Since algorithm Opt is superior to algorithm ALG, the same limit property of the variance applies to
algorithm Opt as well. We now claim that, in fact, if the noise $N(\eta)$ is Gaussian, then the variances in ALG and
Opt go to zero at roughly the same speed.

Given a round t, let $K(t)$ denote the supremum of the fraction $\mathrm{var}(X_a(t, \mathrm{ALG}))/\mathrm{var}(X_a(t, \mathrm{Opt}))$, taken over
all possible populations $\{A_n\}_{n=1}^{\infty}$, all assignments of distributions $\Phi_a \in \mathcal{F}$ to sensors a in $A_n$, all meeting
patterns, and all sensors $a \in A_n$. Note that Theorem 1.2 implies that for any t, we have $K(t) \le \Delta_0$.

Lemma D.2. If the noise $N(\eta)$ is Gaussian then $\lim_{t \to \infty} K(t) = 1$.

Proof. Since the noise is Gaussian we have $\mathrm{var}(N(\eta)) = 1/J_N$. Recall the definition of $c_{\inf}(t)$ from the proof
of Claim D.1. Note now that as $c_{\inf}(t)$ becomes larger and larger the gain in accuracy under algorithm ALG
becomes very close to $J_N$. Indeed, when sensor a observes sensor b at round t, we have:

$$c_a(t+1) - c_a(t) = \frac{1}{1/c_b(t) + 1/J_N}.$$

Specifically, consider now the case that $c_b(t) \ge x \cdot J_N$, for some large $x$. Here, the increase in accuracy at a is
some quantity $\Delta J(t, \mathrm{ALG})$, satisfying

$$\frac{1}{1 + 1/x}\, J_N \le \Delta J(t, \mathrm{ALG}) \le J_N.$$

The Cramér-Rao bound and Lemma 5.1 imply that $J_b(t, \mathrm{Opt}) \ge c_b(t)$, and hence, $J_b(t, \mathrm{Opt}) \ge x \cdot J_N$. This,
together with Theorem 4.2, implies that at round t, the increase $\Delta J(t, \mathrm{Opt})$ in Fisher information of a under
algorithm Opt is some quantity satisfying

$$\frac{1}{1 + 1/x}\, J_N \le \Delta J(t, \mathrm{Opt}) \le J_N.$$

Hence $\Delta J(t, \mathrm{ALG})$ and $\Delta J(t, \mathrm{Opt})$ are the same quantity up to a multiplicative factor of $1 + 1/x$. Finally, since
$\lim_{t \to \infty} c_{\inf}(t) = \infty$ (see the proof of Claim D.1), it follows that $x$ goes to infinity as t goes to infinity. We thus
get $\lim_{t \to \infty} K(t) = 1$, which establishes the proof of the lemma. □
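
The sandwich used in the proof is easy to tabulate: when the observed sensor's accuracy is x times the noise FI, the ALG gain is a fraction x/(1 + x) of the noise FI. The sketch below uses hypothetical names and values and assumes Gaussian noise, as in the lemma.

```python
def alg_gain(c_b, j_n):
    """Accuracy gained by ALG when observing a sensor of accuracy c_b through
    Gaussian noise whose Fisher information is j_n (so var(N) = 1 / j_n)."""
    return 1.0 / (1.0 / c_b + 1.0 / j_n)

j_n = 2.0
xs = (1, 10, 100, 1000)
ratios = [alg_gain(x * j_n, j_n) / j_n for x in xs]
print(list(zip(xs, ratios)))
```

As x grows the ratio approaches 1, which is the sense in which the per-round gain of ALG approaches the Fisher Channel Capacity at large times.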